Abstract. By a method inspired by Stein's method, we derive an upper bound
on the Rubinstein distance between two absolutely continuous probability measures on
configuration space. As an application, we show that the best way to approximate
a Markov modulated Poisson process (see below for the definition) by a Poisson process is to
equate their intensities.



1. Introduction 

According to the Kantorovitch approach, the optimal transportation problem, or Monge-Kantorovitch
problem (MKP for short), reads as follows: given two probability measures $\mu$ and $\nu$ on a
Polish space $X$ and a cost function $c$ on $X \times X$, does there exist a probability measure
$\gamma$ on $X \times X$ which minimizes $\int c \, d\beta$ among all probability measures $\beta$
on $X \times X$ with first (respectively second) marginal $\mu$ (respectively $\nu$)? The first
step is to determine whether or not there exists a probability measure $\gamma$ such that
$\int c \, d\gamma$ is finite. Few criteria are known. The oldest one (see [E]), for the quadratic
cost, states that such a measure exists provided that $\mu$ and $\nu$ have finite second moments.
Still for the quadratic cost, if $\mu$ is a Gaussian measure on a finite or infinite dimensional
space and $d\nu = L \, d\mu$, then the distance is finite whenever $L$ has a finite Boltzmann
entropy [9]. In the same reference, a bound may also be found for the Rubinstein distance, i.e.,
when $c$ is a distance function. We are interested in the evaluation of the Rubinstein distance on
configuration space, i.e., for locally finite point processes. The first point to be stressed is
that there are several reasonable distances between configurations. To name only the two we will
investigate here: there is the total variation distance, when configurations are viewed as atomic
measures, and there is also a distance with a more geometric flavor, which is defined in [18]. To
these two distances correspond two different notions of Lipschitz continuous functions. This is of
great importance since the Kantorovitch-Rubinstein duality allows us to write the Rubinstein
distance as a maximization problem on the set of Lipschitz functions, see |(T|). Moreover, it is
well known in finite dimension that Lipschitz functions are "almost" differentiable and that their
differential is bounded. It turns out that the two usual gradients introduced on configuration
space (see [1], [16]) are the right tools to obtain an analogous result on configuration space.
Once we have a gradient, we usually introduce the divergence (as its adjoint) and then a number
operator (as the composition of the divergence and the gradient), hence an Ornstein-Uhlenbeck
semi-group (see [4]). At this point, it is useful to invoke Stein's method, which is a very
efficient tool to obtain stochastic bounds, notably for point processes (see [6], [5], [21], [2]).
In essence, Stein's method compares the expectations of two distinct random variables by
"embedding" them in the evolution of an ergodic Markov process and by then looking backward at the
evolution of this process, from infinity to time 0. With very sketchy notations, imagine that we
have two smooth functions $\alpha$ and $\beta$ with the same limit at infinity; then
$\alpha(0) - \beta(0)$ may be estimated by computing

\[ \int_0^{\infty} |\alpha'(s) - \beta'(s)| \, ds. \]

Controlling the difference of the derivatives yields a bound on the difference at time 0.
This is exactly the principle at work in Theorem 1. This method is reminiscent of the so-called
semi-group method often used in proofs of concentration inequalities [T^. In Section 2,
we present the generic principle of our method, and in Section 3 we apply it to the different
distances on configuration space. Section 4 is devoted to the application of these
results to the approximation of a Markov modulated Poisson process (MMPP) by a Poisson
process. The motivation for this part comes from queueing theory, where MMPPs are widely
used because of their versatility, which makes them suitable to model a wide range of real
systems jlH, [14], [TS], and because they preserve the Markov property. Unfortunately, these
processes are affected by the curse of dimensionality: it is often the case that we must invert
linear systems with so many variables that the computation becomes infeasible. It is thus of
crucial importance to reduce the cardinality of the state space. The extreme situation is when
this space is reduced to one point, i.e., when the MMPP is a Poisson process. By the method
developed in the first part of this paper, we find a bound on the Rubinstein distance between an
MMPP and a Poisson process. Optimizing this bound yields the well-known rule of thumb
which consists in taking the average intensity of the MMPP as the optimal intensity. Our
result is thus not astoundingly original, but it shows that, by proceeding along this line,
we can control the error for any functional, such as a loss probability.

2. Generic scheme 

Let $X$ be a Polish space and $d$ a lower semi-continuous distance function on $X \times X$,
which does not necessarily generate the topology on $X$. We denote by $d$-$\mathrm{Lip}_m$ the
set of Lipschitz continuous functions $F$ from $X$ to $\mathbb{R}$ with Lipschitz constant $m$:

\[ |F(x) - F(y)| \le m \, d(x, y), \]

for any $(x, y) \in X \times X$. For two probability measures $\mu$ and $\nu$ on $X$, the optimal
transportation problem associated to $d$ consists in evaluating

\[ \mathcal{T}_d(\mu, \nu) = \inf_{\gamma \in \Sigma(\mu, \nu)} \int_{X \times X} d(x, y) \, d\gamma(x, y), \]

where $\Sigma(\mu, \nu)$ is the set of probability measures on $X \times X$ with first
(respectively second) marginal $\mu$ (respectively $\nu$). According to [8, 20], this minimum is
equal to

\[ \mathcal{T}_d(\mu, \nu) = \sup_{F \in d\text{-}\mathrm{Lip}_1} \int_X F \, d(\mu - \nu). \tag{1} \]
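As a sanity check of the duality (1) in the simplest possible setting, the following sketch assumes $X = \mathbb{R}$ with $d(x, y) = |x - y|$ and two finitely supported measures; these choices are illustrative and do not come from the text. It uses the classical one-dimensional fact that the optimal transport cost is the integral of the absolute difference of the two distribution functions, and compares it with the dual functional evaluated at the 1-Lipschitz function $F(x) = -x$. The names `rubinstein_1d` and `dual_value` are hypothetical helpers.

```python
def rubinstein_1d(mu, nu):
    """Rubinstein distance between two finitely supported probability
    measures on the real line, given as {point: mass} dictionaries,
    for the cost d(x, y) = |x - y| (integral of |F_mu - F_nu|)."""
    points = sorted(set(mu) | set(nu))
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        # After this update, cdf_gap equals F_mu - F_nu on [left, right).
        cdf_gap += mu.get(left, 0.0) - nu.get(left, 0.0)
        total += abs(cdf_gap) * (right - left)
    return total


def dual_value(F, mu, nu):
    """Integral of F against mu - nu: one candidate in the supremum (1)."""
    return (sum(F(x) * m for x, m in mu.items())
            - sum(F(x) * m for x, m in nu.items()))


mu = {0.0: 0.5, 1.0: 0.5}
nu = {0.0: 0.25, 2.0: 0.75}
print(rubinstein_1d(mu, nu))
print(dual_value(lambda x: -x, mu, nu))
```

In this example the two printed values coincide, so $F(x) = -x$ attains the supremum; for a general pair of measures a single Lipschitz function only gives a lower bound.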

We consider the situation where $X = \Gamma_\Lambda$ is the configuration space on a Lusin space
$\Lambda$, i.e.,

\[ \Gamma_\Lambda = \{\eta \subset \Lambda; \ \eta \cap K \text{ is a finite set for every compact } K \subset \Lambda\}. \]

We identify $\eta \in \Gamma_\Lambda$ and the positive Radon measure $\sum_{x \in \eta} \varepsilon_x$.
Throughout this paper, $\Gamma_\Lambda$ is endowed with the vague topology, i.e., the weakest
topology such that for all $f \in C_0$ (continuous with compact support on $\Lambda$), the maps

\[ \eta \mapsto \int f \, d\eta = \sum_{x \in \eta} f(x) \]

are continuous. When $f$ is the indicator function of a subset $B$, we will use the shorter
notation $\eta(B)$ to denote the integral of $\mathbf{1}_B$ with respect to $\eta$. We denote by
$\mathcal{B}(\Gamma_\Lambda)$ the corresponding Borel $\sigma$-algebra. The probability space under
consideration will then be $(\Gamma_\Lambda, \mathcal{B}(\Gamma_\Lambda), \mu)$. We need some
additional structure.

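Hypothesis 1. Assume now that we have:

• a kernel $Q$ on $X \times \Lambda$, i.e., such that $Q(A, \cdot)$ is measurable as a function
  on $\Lambda$ for any $A \in \mathcal{B}(\Gamma_\Lambda)$ and $Q(\cdot, s)$ is a $\sigma$-finite
  measure on $X$ for any $s \in \Lambda$,
• a map $\nabla$, defined on a subset $\operatorname{Dom} \nabla$ of $L^2(\mu)$, such that, for
  any $F \in \operatorname{Dom} \nabla$,

\[ \int_X \int_\Lambda |\nabla_s F|^2 \, Q(\omega, ds) \, d\mu(\omega) < +\infty. \]

We say that a process $u = (u(\omega, s))$ belongs to $\operatorname{Dom} \delta$ whenever there
exists a constant $c$ independent of $F$ such that, for any $F \in \operatorname{Dom} \nabla$,

\[ \left| E\left[ \int_\Lambda \nabla_s F \, u(s) \, Q(\omega, ds) \right] \right| \le c \, \|F\|_{L^2(\mu)}. \]

For such a process $u$, we define $\delta u$ by

\[ \int_X \int_\Lambda \nabla_s F(\omega) \, u(\omega, s) \, Q(\omega, ds) \, d\mu(\omega) = \int_X F \, \delta u \, d\mu. \tag{2} \]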

Definition 1. We say that $(\nabla, Q)$ has the Rademacher property whenever
$F \in d$-$\mathrm{Lip}_1$ implies $F \in \operatorname{Dom} \nabla$ and

\[ |\nabla_s F| \le 1, \quad Q(\omega, ds) \, d\mu(\omega)\text{-almost surely.} \tag{3} \]

Consider, for $F \in \operatorname{Dom} \nabla$, the (formal) equation

\[ \frac{d}{dt} X_t = -\delta \nabla X_t, \quad X_0 = F. \tag{4} \]

If this equation has one and only one solution for each $F \in \operatorname{Dom} \nabla$, we
then have a $\mu$-self-adjoint semi-group $(P_t, t \ge 0)$, usually called the Ornstein-Uhlenbeck
semi-group: $P_t F = X_t$ where $X_t$ is the solution of (4).

Definition 2. The Ornstein-Uhlenbeck semi-group is said to be ergodic whenever
$\lim_{t \to +\infty} P_t F = \int_X F \, d\mu$.

Theorem 1. Assume that Hypothesis 1 holds. Let $\nu$ be another probability measure on $X$,
absolutely continuous with respect to $\mu$. We denote by $L$ the Radon-Nikodym derivative of
$\nu$ with respect to $\mu$. If $(\nabla, Q)$ has the Rademacher property and if the
Ornstein-Uhlenbeck semi-group is ergodic, then:

\[ \mathcal{T}_d(\mu, \nu) \le \int_{X \times \Lambda} \int_0^{+\infty} |\nabla_s P_t L| \, dt \, Q(\omega, ds) \, d\mu(\omega). \tag{5} \]

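Proof. According to the fundamental theorem of calculus and the ergodicity of the semi-group,

\[ \int_X F \, d\mu - F = \int_0^{+\infty} \frac{d}{dt} P_t F \, dt
   = -\int_0^{+\infty} \delta \nabla P_t F \, dt
   = -\int_0^{+\infty} P_t \delta \nabla F \, dt. \]

Since $d\nu / d\mu = L$,

\begin{align*}
\int_X F \, d\mu - \int_X F \, d\nu
  &= \int_X \left( \int_X F \, d\mu - F \right) d\nu \\
  &= -\int_X \left( \int_0^{+\infty} P_t \delta \nabla F \, dt \right) d\nu \\
  &= -\int_X \int_0^{+\infty} P_t \delta \nabla F \; L \, dt \, d\mu \\
  &= -\int_{X \times \Lambda} \int_0^{+\infty} \nabla_s F \, \nabla_s P_t L \, dt \, Q(\omega, ds) \, d\mu(\omega),
\end{align*}

where the last equality follows from the self-adjointness of $P_t$ and the duality relation (2).
Since $(\nabla, Q)$ has the Rademacher property, we have an $L^\infty$ bound on $\nabla F$, which
yields (5). □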

3. Instantiations on Poisson space 

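Let $\rho$ be a $\sigma$-finite measure on $\Lambda$ and assume that $\mu$ is the Poisson measure
of intensity $\rho$, i.e., the probability measure on $\Gamma_\Lambda$ fully characterized by

\[ E\left[ \exp\left( -\int f \, d\eta \right) \right] = \exp\left( \int_\Lambda \left( e^{-f(s)} - 1 \right) d\rho(s) \right). \]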



3.1. Discrete gradient on Poisson space. For $F : X \to \mathbb{R}$, the discrete gradient of $F$,
denoted by $\nabla^\sharp F$, is defined by

\[ \nabla^\sharp_s F(\eta) = F(\eta + \varepsilon_s) - F(\eta). \]
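To make the definition concrete, here is a minimal sketch in which a finite configuration on the real line is stored as a Python `frozenset` of points and $F(\eta) = \eta([0, 1])$ counts the points falling in $[0, 1]$. The representation, the choice of $F$ and the helper names are assumptions made for illustration; the sketch also assumes that the added point $s$ is not already in $\eta$.

```python
def discrete_gradient(F, eta, s):
    """Discrete gradient of F at the configuration eta in the direction s,
    i.e. F(eta + eps_s) - F(eta). Assumes s is not already a point of eta."""
    return F(eta | {s}) - F(eta)


def count_in_unit_interval(eta):
    """F(eta) = eta([0, 1]): number of points of eta lying in [0, 1]."""
    return sum(1 for x in eta if 0.0 <= x <= 1.0)


eta = frozenset({0.2, 0.7, 1.5, 3.1})
print(discrete_gradient(count_in_unit_interval, eta, 0.5))  # prints 1
print(discrete_gradient(count_in_unit_interval, eta, 2.0))  # prints 0
```

For this counting functional the gradient is the indicator of $[0, 1]$ evaluated at $s$, so adding one point changes the value by at most 1.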

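We set $Q(\omega, ds) = d\rho(s)$ so that $\operatorname{Dom} \nabla^\sharp$ is defined as the set
of functionals such that

\[ E\left[ \int_\Lambda |\nabla^\sharp_s F|^2 \, d\rho(s) \right] < +\infty. \]

We denote by $\delta^\sharp$ its adjoint in the sense of (2). The $n$-th iterated integral of a
symmetric function $f$ from $\Lambda^n$ to $\mathbb{R}$ is defined as

\[ J_n(f) = n! \int \cdots \int_{0 < s_1 < s_2 < \dots < s_n} f(s_1, \dots, s_n) \, d(\eta - \rho)(s_1) \dots d(\eta - \rho)(s_n). \]

For a general function $f$,

\[ J_n(f) = \sum_{\sigma \in \mathfrak{S}_n} \int \cdots \int_{0 < s_1 < s_2 < \dots < s_n} f(s_{\sigma(1)}, \dots, s_{\sigma(n)}) \, d(\eta - \rho)(s_1) \dots d(\eta - \rho)(s_n). \]

It is well known [19], [16] that any square integrable functional on $\Gamma_\Lambda$ can be
written as

\[ F = \sum_{n=0}^{+\infty} J_n(f_n), \]

where for any integer $n$, $f_n$ is symmetric and belongs to $L^2(\rho^{\otimes n})$, and that

\[ \nabla^\sharp_s F(\eta) = \sum_{n=1}^{+\infty} n \, J_{n-1}(f_n(\cdot, s)). \]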
Moreover, the Ornstein-Uhlenbeck semi-group operates on chaos as:

\[ P^\sharp_t F = \sum_{n=0}^{+\infty} e^{-nt} J_n(f_n). \tag{6} \]

From (6) and by dominated convergence, it is then easily seen that $P^\sharp$ is ergodic. We now
choose the total variation as the distance of interest on $\Gamma_\Lambda$, i.e.,

\[ d_1(\eta, \omega) = 2 \sup_{A \subset \Lambda} |\eta(A) - \omega(A)|. \]

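Lemma 1. For the distance $d_1$ on $X$, $(\nabla^\sharp, Q)$ has the Rademacher property.

Proof. Consider $F \in d_1$-$\mathrm{Lip}_1$; by the very definition of the gradient:

\begin{align*}
|\nabla^\sharp_s F(\eta)| &\le |F(\eta + \varepsilon_s) - F(\eta)| \\
  &\le d_1(\eta + \varepsilon_s, \eta) = 1.
\end{align*}

In the converse direction, let $\omega$ and $\eta$ be two locally finite but not finite
configurations. If $d_1(\omega, \eta) = +\infty$, there is nothing to prove. If
$d_1(\omega, \eta)$ is finite, $\omega \Delta \eta$ and $\eta \Delta \omega$ are finite, where
$\omega \Delta \eta = \omega \setminus (\omega \cap \eta)$. Since $|\nabla^\sharp_s F(\eta)| \le 1$,
we get:

\begin{align*}
|F(\eta) - F(\omega)|
  &\le |F((\eta \cap \omega) \cup \eta \Delta \omega) - F(\eta \cap \omega)|
     + |F((\eta \cap \omega) \cup \omega \Delta \eta) - F(\eta \cap \omega)| \\
  &\le (\eta \Delta \omega)(\Lambda) + (\omega \Delta \eta)(\Lambda) \\
  &\le 2 \max\big( (\eta \Delta \omega)(\Lambda), (\omega \Delta \eta)(\Lambda) \big) \\
  &= d_1(\eta, \omega).
\end{align*}

The Rademacher property is then established for $(\nabla^\sharp, Q)$. □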

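Theorem 2. Let $\mu$ and $\nu$ be two probability measures on $\Gamma_\Lambda$ such that
$d\nu = L \, d\mu$. We have

\[ \mathcal{T}_{d_1}(\mu, \nu) \le E\left[ \int_\Lambda \left| (\mathrm{Id} + \mathcal{L}^\sharp)^{-1} \nabla^\sharp_s L \right| d\rho(s) \right], \]

where $\mathcal{L}^\sharp = \delta^\sharp \nabla^\sharp$.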

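Proof. It is easily seen using the chaos decomposition that

\[ \nabla^\sharp_s P^\sharp_t F = e^{-t} P^\sharp_t \nabla^\sharp_s F \quad \text{for all } s \in \Lambda, \text{ for all } t \in \mathbb{R}^+, \]

and it is a general property of semi-groups and their generators that

\[ \int_0^{+\infty} e^{-t} P^\sharp_t F \, dt = (\mathrm{Id} + \mathcal{L}^\sharp)^{-1} F, \]

for any $F : \Omega \to \mathbb{R}$. We then infer from (5) that

\begin{align*}
\mathcal{T}_{d_1}(\mu, \nu)
  &\le E\left[ \int_\Lambda \left| \int_0^{+\infty} e^{-t} P^\sharp_t \nabla^\sharp_s L \, dt \right| d\rho(s) \right] \\
  &= E\left[ \int_\Lambda \left| (\mathrm{Id} + \mathcal{L}^\sharp)^{-1} (\nabla^\sharp_s L) \right| d\rho(s) \right].
\end{align*}

□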
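The resolvent identity used in this proof can be checked chaos by chaos. By (4) and (6), the semi-group multiplies the $n$-th chaos by $e^{-nt}$, so the generator multiplies it by $n$ and the identity reduces to $\int_0^{+\infty} e^{-t} e^{-nt} \, dt = 1/(1+n)$. The sketch below verifies this scalar statement numerically with a plain trapezoidal rule; the function name, the truncation horizon and the step count are arbitrary choices made for the illustration.

```python
import math


def resolvent_weight(n, horizon=40.0, steps=100_000):
    """Trapezoidal approximation of the integral over [0, horizon] of
    exp(-t) * exp(-n * t), the weight applied to the n-th chaos."""
    h = horizon / steps
    total = 0.5 * (1.0 + math.exp(-(n + 1) * horizon))
    for k in range(1, steps):
        total += math.exp(-(n + 1) * k * h)
    return total * h


for n in range(4):
    # Compare the numerical integral with the closed form 1 / (1 + n).
    print(n, resolvent_weight(n), 1.0 / (1 + n))
```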

Remark 1. Note that the exact analog of this inequality on Wiener space was proved in
a different though related way in [10].
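3.2. Derivation on Poisson space. In this section we introduce another stochastic
gradient on $\Gamma_\Lambda$, which is a derivation. Let $V(\Lambda)$ be the set of $C^\infty$
vector fields on $\Lambda$ and $V_0(\Lambda) \subset V(\Lambda)$ the subset consisting of all
vector fields with compact support. For $v \in V_0(\Lambda)$ and any $x \in \Lambda$, the curve

\[ t \mapsto V^v_t(x) \in \Lambda \]

is defined as the solution of the following Cauchy problem

\[ \frac{d}{dt} V^v_t(x) = v(V^v_t(x)), \quad V^v_0(x) = x. \tag{7} \]

The associated flow $(V^v_t, t \in \mathbb{R})$ induces a curve $(V^v_t)^* \eta$,
$t \in \mathbb{R}$, on $\Gamma_\Lambda$: if $\eta = \sum_{x \in \eta} \varepsilon_x$, then
$(V^v_t)^* \eta = \sum_{x \in \eta} \varepsilon_{V^v_t(x)}$. We are then in a position to define
the notion of differentiability on $\Gamma_\Lambda$. A measurable function
$F : \Gamma_\Lambda \to \mathbb{R}$ is said to be differentiable if, for any $v \in V_0(\Lambda)$,
the following limit exists:

\[ \lim_{t \to 0} t^{-1} \left( F((V^v_t)^* \eta) - F(\eta) \right). \]

We then denote by $\nabla_v F(\eta)$ the preceding limit. We denote by $\nabla^c$ the
corresponding gradient. It satisfies:

\[ \nabla_v F(\omega) = \int_\Lambda \nabla^c_s F(\omega) \, v(s) \, d\omega(s). \]

The square norm of $\nabla^c F$ is given by

\[ \int_\Lambda |\nabla^c_s F|^2 \, d\omega(s), \]

so that we are in the framework of Hypothesis 1 if we take

\[ Q(\omega, ds) = d\omega(s) = \sum_{x \in \omega} \varepsilon_x(ds), \]

where $\varepsilon_a$ is the Dirac mass at $a$. For a random variable
$F : \Gamma_\Lambda \to \mathbb{R}$ and a random process
$u : \Gamma_\Lambda \times \Lambda \to \mathbb{R}$, we define the adjoint operator of $\nabla^c$,
denoted by $\delta^c$, by:

\[ E\left[ \langle \nabla^c F, u \rangle_{L^2(\rho)} \right] = E\left[ F \, \delta^c u \right], \]

provided both sides exist, i.e.,

\[ \left| E\left[ \langle \nabla^c F, u \rangle_{L^2(\rho)} \right] \right| \le c \, \|F\|_{L^2(\mu)}. \]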

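Consider now $\mathcal{L}^c = \delta^c \nabla^c$ and the associated semi-group
$(P^c_t, t \ge 0)$. The distance of interest is here the Wasserstein distance (see [1, 18]):

\[ d_2(\eta_1, \eta_2) = \inf\left\{ \int d_0(x, y) \, d\beta(x, y), \ \beta \in \Gamma_{\eta_1, \eta_2} \right\}, \tag{8} \]

where $\Gamma_{\eta_1, \eta_2}$ denotes the set of $\beta \in \Gamma_{\Lambda \times \Lambda}$
having marginals $\eta_1$ and $\eta_2$. The ergodicity of $P^c$ is proved in [1] and the
Rademacher property is the object of [18].

On the other hand, there is no known commutation relationship between $\nabla^c$ and $P^c_t$,
hence Theorem 1 only entails the following.

Theorem 3. Let $\mu$ and $\nu$ be two probability measures on $\Gamma_\Lambda$ such that
$d\nu = L \, d\mu$. We have

\[ \mathcal{T}_{d_2}(\mu, \nu) \le E\left[ \int_\Lambda \int_0^{+\infty} |\nabla^c_s P^c_t L| \, dt \, d\omega(s) \right]. \]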
4. Applications
4.1. Distance between two Poisson processes.

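Theorem 4. Consider $\mu$ and $\nu$ two Poisson probability measures on
$\Lambda \subset \mathbb{R}^n$. The intensity of $\mu$ is the Lebesgue measure on $\Lambda$ and
that of $\nu$ is $h(s) \, ds$ with $h \in L^1(\Lambda)$ deterministic. Then, we have the following
bound:

\[ \mathcal{T}_{d_1}(\mu, \nu) \le c \, \|h\|_{L^1(\Lambda)} \exp\left( \tfrac{1}{2} \|h - 1\|^2_{L^2(\Lambda)} \right). \tag{9} \]

Proof. According to [12],

\[ L(\omega) = \frac{d\nu}{d\mu}(\omega) = \exp\left( \int_\Lambda \ln h(s) \, d\omega(s) - \int_\Lambda (h(s) - 1) \, ds \right). \]

By definition,

\[ \nabla^\sharp_s L(\omega) = L(\omega + \varepsilon_s) - L(\omega) = h(s) L. \]

Hence, according to Theorem 2,

\[ \mathcal{T}_{d_1}(\mu, \nu) \le c \, \|h\|_{L^1(\Lambda)} \, E\left[ \left| (\mathrm{Id} + \mathcal{L}^\sharp)^{-1} L \right| \right]. \]

It is then well known, using for instance the chaos decomposition, that

\[ E\left[ \left| (\mathrm{Id} + \mathcal{L}^\sharp)^{-1} L \right| \right] \le E\left[ L^2 \right]^{1/2} \]

and

\begin{align*}
E\left[ L^2 \right]
  &= E\left[ \exp\left( 2 \int_\Lambda \ln h(s) \, d\omega(s) - 2 \int_\Lambda (h(s) - 1) \, ds \right) \right] \\
  &= E\left[ \exp\left( \int_\Lambda \ln h^2(s) \, d\omega(s) - \int_\Lambda (h^2(s) - 1) \, ds \right) \right]
     \exp\left( \int_\Lambda (h(s) - 1)^2 \, ds \right) \\
  &= \exp\left( \int_\Lambda (h(s) - 1)^2 \, ds \right).
\end{align*}

The proof is thus complete. □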
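The identity $E[L^2] = \exp\left( \int_\Lambda (h(s) - 1)^2 \, ds \right)$ used at the end of this proof can be verified by hand in the simplest case. The sketch below assumes $\Lambda = [0, 1]$ and a constant density $h > 0$ (both are illustrative assumptions): under $\mu$ the number of points $N$ is Poisson with mean 1 and $L = h^N e^{-(h-1)}$, so $E[L^2]$ is a series that can be summed directly. The function name and the truncation at 200 terms (adequate for moderate $h$) are choices made here.

```python
import math


def second_moment_of_density(h, terms=200):
    """E[L^2] for L = h**N * exp(-(h - 1)), N ~ Poisson(1), constant h > 0.
    The n-th term is h**(2n) * exp(-2(h - 1)) * exp(-1) / n!."""
    term = math.exp(-2.0 * (h - 1.0) - 1.0)  # n = 0 term
    total = 0.0
    for n in range(terms):
        total += term
        term *= h * h / (n + 1)  # ratio between consecutive terms
    return total


for h in (0.5, 2.0):
    # Series value versus the closed form exp((h - 1)**2).
    print(h, second_moment_of_density(h), math.exp((h - 1.0) ** 2))
```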



4.2. Distance between a Poisson process and a Markov modulated Poisson
process. In this section we calculate a bound for the distance between a Poisson process
and a Markov modulated Poisson process (MMPP for short).

Definition 3. Consider $J$ an irreducible continuous time Markov chain with finite state
space. We denote by $m_J$ the finite number of states of $J$, $Q_J$ the infinitesimal generator
of $J$, and $\pi_J$ the stationary vector of $Q_J$. We assume also that we are given
$(\lambda_1, \dots, \lambda_{m_J})$, $m_J$ non-negative real numbers. We denote by $\psi$ the map
which sends $i \in \{1, \dots, m_J\}$ to $\lambda_i$. An MMPP$(J, \psi)$ is a point process the
intensity of which is given by $\psi(J_s) \, ds$. This means that when $J$ is in state (called a
phase) $k$ ($1 \le k \le m_J$), the arrivals occur according to a Poisson process of rate
$\lambda_k$.

A detailed description of the MMPP with an emphasis on applicability to modeling is
given in [14] and references therein.
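The definition translates directly into a simulation recipe: stay in the current phase for an exponential sojourn time, generate Poisson arrivals at the phase rate during that sojourn, then jump. The following is a minimal sketch under stated assumptions, not a reference implementation: the function name, the list-of-rows encoding of the generator, the fixed starting phase and the seed are all choices made for this illustration.

```python
import random


def simulate_mmpp(generator, rates, horizon, seed=0, start_phase=0):
    """Arrival times in [0, horizon) of an MMPP whose phase process has the
    given generator (list of rows) and arrival rate rates[i] in phase i."""
    rng = random.Random(seed)
    phases = range(len(rates))
    t, phase, arrivals = 0.0, start_phase, []
    while t < horizon:
        exit_rate = -generator[phase][phase]
        # Sojourn time in the current phase (infinite if it is absorbing).
        sojourn = rng.expovariate(exit_rate) if exit_rate > 0 else float("inf")
        end = min(t + sojourn, horizon)
        # Poisson arrivals of rate rates[phase] on [t, end).
        if rates[phase] > 0:
            s = t + rng.expovariate(rates[phase])
            while s < end:
                arrivals.append(s)
                s += rng.expovariate(rates[phase])
        if end >= horizon:
            break
        # Jump to j != phase with probability generator[phase][j] / exit_rate.
        weights = [generator[phase][j] if j != phase else 0.0 for j in phases]
        phase = rng.choices(phases, weights=weights)[0]
        t = end
    return arrivals


Q_J = [[-1.0, 1.0], [2.0, -2.0]]   # two phases, hypothetical switching rates
arrivals = simulate_mmpp(Q_J, rates=[0.5, 5.0], horizon=10.0, seed=1)
print(len(arrivals))
```

Starting from a fixed phase rather than from the stationary vector is a simplification; for long horizons the effect of the initial phase fades.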

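Theorem 5. Let $\mu$ be a Poisson process of intensity $\lambda$ and $\nu$ be an
MMPP$(J, \psi)$; then, for a given $T \in \mathbb{R}^+$,

\[ \mathcal{T}_{d_1}(\mu_T, \nu_T) \le c \, E\left[ \int_0^T \frac{\psi(J_s)}{\lambda} \, \lambda \, ds \;
   \exp\left( \frac{1}{2} \int_0^T \left( \frac{\psi(J_s)}{\lambda} - 1 \right)^2 \lambda \, ds \right) \right]. \tag{10} \]

Proof. By the Girsanov formula we have:

\[ L_T(\omega) = \frac{d\nu_T}{d\mu_T}(\omega)
   = \exp\left( \int_0^T \ln \frac{\psi(J_s)}{\lambda} \, d\omega(s) - \int_0^T \left( \frac{\psi(J_s)}{\lambda} - 1 \right) \lambda \, ds \right). \]

Consider $\mathcal{G}^J_T = \sigma\{J_s, s \le T\}$, the history of $J$ up to time $T$. Now,

\begin{align*}
\mathcal{T}_{d_1}(\mu_T, \nu_T)
  &= \sup_{F \in d_1\text{-}\mathrm{Lip}_1} \left( E_{\mu_T}\left[ F (L_T - 1) \right] \right) \\
  &= \sup_{F \in d_1\text{-}\mathrm{Lip}_1} \left( E_{\mu_T}\left[ E\left[ F (L_T - 1) \mid \mathcal{G}^J_T \right] \right] \right) \\
  &\le E_{\mu_T}\left[ \sup_{F \in d_1\text{-}\mathrm{Lip}_1} \left( E\left[ F (L_T - 1) \mid \mathcal{G}^J_T \right] \right) \right].
\end{align*}

But conditioning on $J$, the intensity is deterministic, so according to (9),

\[ \mathcal{T}_{d_1}(\mu_T, \nu_T) \le c \, E\left[ \int_0^T \frac{\psi(J_s)}{\lambda} \, \lambda \, ds \;
   \exp\left( \frac{1}{2} \int_0^T \left( \frac{\psi(J_s)}{\lambda} - 1 \right)^2 \lambda \, ds \right) \right], \]

which ends the proof. □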



One can then try to determine the nearest Poisson process to a given MMPP. To this end, we
seek $\lambda_{opt}$ such that the upper bound in (10) is minimal. It is clear that this minimum
is governed by the exponential part of the expression. It is thus enough to minimize

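\[ \phi_T(\lambda) = E\left[ \int_0^T \left( \frac{\psi(J_s)}{\lambda} - 1 \right)^2 \lambda \, ds \right]. \]

It is well known that, for large $T$, we have

\[ \phi_T(\lambda) \sim \lambda T \sum_{i=1}^{m_J} \pi(i) \left( \frac{\lambda_i}{\lambda} - 1 \right)^2, \]

which is minimal for $\lambda = \lambda_{opt} = \sum_{i=1}^{m_J} \lambda_i \pi(i)$. Moreover,

\[ \sum_{i=1}^{m_J} \left( \frac{\lambda_i}{\lambda_{opt}} - 1 \right)^2 \lambda_{opt} \, \pi(i)
   = \sum_{i=1}^{m_J} \frac{(\lambda_i - \lambda_{opt})^2}{\lambda_{opt}} \, \pi(i)
   = \frac{V_{opt}}{\lambda_{opt}}, \]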

which is known in queueing theory as the burstiness of the MMPP. Finally, the distance
between an MMPP and the Poisson process of intensity equal to the mean arrival rate of the
MMPP is bounded by

\[ \lambda_{opt} T \exp\left( \frac{V_{opt}}{2 \lambda_{opt}} T \right). \]

In queueing theory, the choice of $\lambda_{opt}$ as $\sum_{i=1}^{m_J} \lambda_i \pi(i)$ is
imposed by "load" conservation: one can only compare queueing systems with the same load, where
the load (or traffic) is defined as the product of the mean arrival rate and the mean service
time. Our result shows that this choice is likely to be the optimal one. Moreover, we are now in
a position to evaluate precisely the error due to this approximation. Our bound gives a
qualitative basis for the experimental rule that not only the load is important to evaluate the
performance of a queueing system, but the so-called burstiness must also be taken into account.
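For a concrete MMPP, the mean arrival rate and the burstiness are easy to compute once the stationary vector is known. The sketch below is illustrative only: it obtains the stationary vector by power iteration on a uniformized version of the generator (a standard device chosen here, not a method taken from the text), and it takes the burstiness to be the variance of the stationary arrival rate divided by its mean, which is how the quantity above is read in this sketch. All function names are hypothetical.

```python
def stationary_vector(generator, iterations=10_000):
    """Stationary vector of an irreducible generator, by power iteration
    on the uniformized transition matrix P = I + generator / scale."""
    m = len(generator)
    scale = 2.0 * max(-generator[i][i] for i in range(m))
    P = [[(1.0 if i == j else 0.0) + generator[i][j] / scale
          for j in range(m)] for i in range(m)]
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi


def mean_rate_and_burstiness(generator, rates):
    """Return (lambda_opt, variance of the stationary rate / lambda_opt)."""
    pi = stationary_vector(generator)
    lam_opt = sum(l * p for l, p in zip(rates, pi))
    variance = sum((l - lam_opt) ** 2 * p for l, p in zip(rates, pi))
    return lam_opt, variance / lam_opt


# Two phases with stationary vector (2/3, 1/3) and rates 1 and 4.
lam_opt, burstiness = mean_rate_and_burstiness([[-1.0, 1.0], [2.0, -2.0]], [1.0, 4.0])
print(lam_opt, burstiness)
```

In this example the mean rate is 2 and the variance of the rate is also 2, so the ratio equals 1; with equal rates in both phases the ratio is 0 and the MMPP reduces to a Poisson process.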